An Analysis of California Political Contributions to the 2012 Presidential Election by David Pankiewicz

In this analysis, I looked for common trends in donation amounts and in total money donated, and at how these vary with other factors in the data.

First, I wanted to get a basic understanding of my dataset:
* How many observations am I working with?
* What types of data do I have and what are some basic summaries about this data?

## [1] 908181     23
##  [1] "committee.id"        "candidate.id"        "candidate"          
##  [4] "name"                "city"                "state"              
##  [7] "zip"                 "employer"            "occupation"         
## [10] "amount"              "date"                "receipt.description"
## [13] "memo.code"           "memo.text"           "form.type"          
## [16] "file.number"         "transaction.id"      "election.type"      
## [19] "democrat"            "democrat.amount"     "republican"         
## [22] "republican.amount"   "all.donations"
## 'data.frame':    908181 obs. of  23 variables:
##  $ committee.id       : Factor w/ 15 levels "C00410118","C00431171",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ candidate.id       : Factor w/ 14 levels "P00003608","P20002523",..: 13 13 13 13 13 13 13 13 13 13 ...
##  $ candidate          : Factor w/ 2 levels "Obama, Barack",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ name               : Factor w/ 199431 levels "..., ANTHONY",..: 15658 86905 15322 173182 123110 121194 117757 22526 101066 181746 ...
##  $ city               : Factor w/ 2497 levels "","@GMAIL.COM",..: 2011 1116 1766 495 2275 566 1983 1134 1574 2024 ...
##  $ state              : Factor w/ 1 level "CA": 1 1 1 1 1 1 1 1 1 1 ...
##  $ zip                : Factor w/ 115949 levels "","*9040","*9136",..: 98439 17525 63152 9576 15040 19192 41578 67294 62008 13821 ...
##  $ employer           : Factor w/ 68255 levels ""," FAIRCHILD SEMI",..: 48612 48612 48612 5206 13650 48612 41612 23615 48612 53064 ...
##  $ occupation         : Factor w/ 31407 levels ""," BUILDING TECH",..: 23369 23369 23369 18321 26086 23369 23369 15850 23369 31253 ...
##  $ amount             : num  10 500 25 30 100 112 250 500 25 250 ...
##  $ date               : Date, format: "2011-09-27" "2011-08-29" ...
##  $ receipt.description: Factor w/ 37 levels "","ATTRIBUTION TO PARTNERS REQUESTED / REDESIGNATION REQUESTED",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ memo.code          : Factor w/ 2 levels "","X": 1 1 1 1 1 1 1 1 1 1 ...
##  $ memo.text          : Factor w/ 261 levels "","*","* EARMARKED CONTRIBUTION: SEE BELOW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ form.type          : Factor w/ 3 levels "SA17A","SA18",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ file.number        : int  756218 756218 756218 756218 756218 756218 756218 756218 756218 756218 ...
##  $ transaction.id     : Factor w/ 870697 levels "0000002","0000004-0001",..: 41504 37447 40383 42192 40782 34292 37792 35669 40214 41799 ...
##  $ election.type      : Factor w/ 8 levels "","G2008","G2012",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ democrat           : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ democrat.amount    : num  10 500 25 30 100 112 250 500 25 250 ...
##  $ republican         : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ republican.amount  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ all.donations      : num  1 1 1 1 1 1 1 1 1 1 ...
##     committee.id       candidate.id            candidate     
##  C00431445:717147   P80003338:717147   Obama, Barack:717147  
##  C00431171:191034   P80003353:191034   Romney, Mitt :191034  
##  C00410118:     0   P00003608:     0                         
##  C00493692:     0   P20002523:     0                         
##  C00494393:     0   P20002556:     0                         
##  C00495622:     0   P20002671:     0                         
##  (Other)  :     0   (Other)  :     0                         
##                         name                   city        state      
##  HENRY, MICHELLE          :   272   LOS ANGELES  : 68063   CA:908181  
##  GROVER-MCKAY, MALEAH     :   245   SAN FRANCISCO: 57619              
##  BRYANT, REGINA           :   223   SAN DIEGO    : 35024              
##  RACCIO, KYLE             :   178   OAKLAND      : 23733              
##  BEEGHLY, CHRISTINA D. MS.:   167   SAN JOSE     : 18159              
##  REYNOLDS, MARK           :   166   BERKELEY     : 16876              
##  (Other)                  :906930   (Other)      :688707              
##       zip                                           employer     
##  94114  :  2792   RETIRED                               :182245  
##  94110  :  2515   SELF-EMPLOYED                         :131549  
##  94611  :  2180   NOT EMPLOYED                          : 66741  
##  94117  :  2112   INFORMATION REQUESTED PER BEST EFFORTS: 19852  
##  90046  :  1676   INFORMATION REQUESTED                 : 19138  
##  94941  :  1622   (Other)                               :488441  
##  (Other):895284   NA's                                  :   215  
##                                   occupation         amount        
##  RETIRED                               :203434   Min.   :    0.09  
##  ATTORNEY                              : 30840   1st Qu.:   25.00  
##  HOMEMAKER                             : 21084   Median :   50.00  
##  INFORMATION REQUESTED PER BEST EFFORTS: 18989   Mean   :  190.16  
##  TEACHER                               : 18464   3rd Qu.:  100.00  
##  (Other)                               :615313   Max.   :30000.00  
##  NA's                                  :    57                     
##       date                                         receipt.description
##  Min.   :2011-04-04                                          :906365  
##  1st Qu.:2012-07-16   REDESIGNATION FROM PRIMARY             :   647  
##  Median :2012-09-17   REATTRIBUTION / REDESIGNATION REQUESTED:   376  
##  Mean   :2012-08-10   REATTRIBUTION FROM SPOUSE              :   328  
##  3rd Qu.:2012-10-17   SEE REATTRIBUTION                      :   227  
##  Max.   :2012-12-31   REDESIGNATION FROM GENERAL             :   194  
##                       (Other)                                :    44  
##  memo.code                                memo.text      form.type     
##   :692330                                      :688882   SA17A:693673  
##  X:215851   * OBAMA VICTORY FUND 2012          :135822   SA18 :214508  
##             TRANSFER FROM ROMNEY VICTORY INC.  : 77912   SB28A:     0  
##             * EARMARKED CONTRIBUTION: SEE BELOW:  1189                 
##             *                                  :   665                 
##             REDESIGNATION FROM PRIMARY         :   647                 
##             (Other)                            :  3064                 
##   file.number          transaction.id   election.type       democrat     
##  Min.   :756214   C19560355   :     2   G2012  :529269   Min.   :0.0000  
##  1st Qu.:810684   SA17.1000048:     2   P2012  :378261   1st Qu.:1.0000  
##  Median :821325   SA17.1000101:     2   O2012  :   651   Median :1.0000  
##  Mean   :830860   SA17.1000103:     2          :     0   Mean   :0.7897  
##  3rd Qu.:842943   SA17.1000138:     2   G2008  :     0   3rd Qu.:1.0000  
##  Max.   :944828   SA17.1000144:     2   P      :     0   Max.   :1.0000  
##                   (Other)     :908169   (Other):     0                   
##  democrat.amount     republican     republican.amount  all.donations
##  Min.   :    0.0   Min.   :0.0000   Min.   :    0.00   Min.   :1    
##  1st Qu.:   10.0   1st Qu.:0.0000   1st Qu.:    0.00   1st Qu.:1    
##  Median :   35.0   Median :0.0000   Median :    0.00   Median :1    
##  Mean   :  103.1   Mean   :0.2103   Mean   :   87.05   Mean   :1    
##  3rd Qu.:  100.0   3rd Qu.:0.0000   3rd Qu.:    0.00   3rd Qu.:1    
##  Max.   :25800.0   Max.   :1.0000   Max.   :30000.00   Max.   :1    
## 

This summary alone tells us a lot about the dataset:
* Most donations (717,147 of 908,181, or about 79%) go to Obama.
* Not surprisingly, the cities that send the most donations are the most populated cities.
* Many donors are retired or self-employed (more on this later).
* Of those employed by others, the most common occupation is attorney.
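* The median donation is $50, and the distribution is right-skewed: the mean ($190.16) is far above the median.
* The median date is Sep 17th, 2012, and the third quartile is Oct 17th, 2012. Thus, half of all donations come in roughly the last seven weeks of the campaign or later (the latest date is Dec 31st, 2012). A quarter come after Oct 17th, less than three weeks before the election on Nov 6th, 2012.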
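The six-number summaries above come from R’s `summary()`. Its quartiles use R’s default quantile definition (type 7, linear interpolation between order statistics). The sketch below is a rough Python illustration of the same summary, not the code behind this report:

* It uses only the standard library.
* `method="inclusive"` in `statistics.quantiles` matches R’s type 7.
* The amounts are hypothetical toy values.
* The last line checks the day count between the median date and Election Day.

```python
import statistics
from datetime import date

def six_number_summary(values):
    """Min, 1st Qu., Median, Mean, 3rd Qu., Max, in the style of R's summary()."""
    # "inclusive" interpolates between order statistics like R's quantile type 7.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "Min.": min(values),
        "1st Qu.": q1,
        "Median": median,
        "Mean": statistics.fmean(values),
        "3rd Qu.": q3,
        "Max.": max(values),
    }

# Hypothetical donation amounts with one large outlier.
amounts = [10, 25, 25, 50, 50, 100, 250, 5000]
s = six_number_summary(amounts)
print(s)

# A mean far above the median is a quick signal of right skew.
print("right-skewed:", s["Mean"] > s["Median"])

# Days from the median donation date (Sep 17th) to Election Day 2012.
print((date(2012, 11, 6) - date(2012, 9, 17)).days)
```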

Next, I wanted to see the most common values of my categorical variables.
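The counts behind charts like these are a simple frequency table. Here is a minimal Python sketch using `collections.Counter`, as an illustration rather than the code used for this report. The column holds a handful of hypothetical rows. Blank strings are skipped because the city field contains an empty level.

```python
from collections import Counter

def top_values(column, k=5):
    """Return the k most frequent non-empty values of a categorical column."""
    # Empty strings stand in for missing values and are not counted.
    return Counter(v for v in column if v).most_common(k)

# Hypothetical city column, one entry per donation.
cities = ["LOS ANGELES", "SAN FRANCISCO", "", "LOS ANGELES", "OAKLAND",
          "SAN DIEGO", "LOS ANGELES", "SAN FRANCISCO"]
print(top_values(cities, k=2))
```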

Histogram of Most Common Occurrences

Nothing too surprising here. Generally, the cities and ZIP codes with the most donations tend to be the most populous ones.

At first glance, this histogram seems strange. Do retired people really donate that much more often than everyone else? There is some credence to this, as older citizens tend to be the most politically active in terms of voting. However, the trend is better explained by aggregation. Retired people are probably not that different from other contributors, except that they are older and no longer work. In other words, former teachers, former attorneys, and so on are all lumped into one category, “retired”. Thus, when reviewing the most commonly held occupations, it makes more sense to exclude retired people. Additionally, let’s remove those who did not disclose their occupation.
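The filtering step can be sketched in Python as follows. The exclusion set is a hypothetical choice built from labels that appear in the data summary above: "RETIRED" plus the non-disclosure entries. The whitespace and case normalization is also an assumption, prompted by levels such as " BUILDING TECH".

```python
from collections import Counter

# Hypothetical exclusion set: retirement and non-disclosure labels seen in the
# data summary, which describe no actual occupation.
EXCLUDED = {
    "",
    "RETIRED",
    "INFORMATION REQUESTED",
    "INFORMATION REQUESTED PER BEST EFFORTS",
}

def top_occupations(occupations, k=5, excluded=EXCLUDED):
    """Most common occupations after dropping missing and excluded labels."""
    # Normalize whitespace and case so " teacher" and "TEACHER" count together.
    cleaned = (o.strip().upper() for o in occupations if o is not None)
    return Counter(o for o in cleaned if o not in excluded).most_common(k)

occupations = ["RETIRED", "ATTORNEY", "TEACHER", "RETIRED", "ATTORNEY",
               "INFORMATION REQUESTED PER BEST EFFORTS", None, " teacher", "HOMEMAKER"]
print(top_occupations(occupations, k=3))
```

The same function works for the employer field by passing a different `excluded` set, for example one containing "SELF-EMPLOYED" and "NOT EMPLOYED".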

Even from these results, it’s still hard to draw any particular conclusions. We’d need data on the distribution of occupations in California to tell whether these particular occupations are more likely to contribute, or are just the most common amongst the population at large.

A few non-organizational “employers” (such as RETIRED, SELF-EMPLOYED, and NOT EMPLOYED) skew the distribution a bit. Let’s remove them…

The most common employers tend to be the largest employers within the state of California. Again, we’d need more information about how employment is distributed across employers to know whether any particular employer over- or under-indexes on its rate of contribution.
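Suppose outside data on California employment were available. One common way to quantify over- or under-indexing is a representation index: a group’s share of donations divided by its share of the population, times 100. Below is a minimal sketch. The counts in the example are hypothetical placeholders, not figures from this dataset or from any census source.

```python
def representation_index(group_donations, total_donations,
                         group_population, total_population):
    """100 * (share of donations) / (share of population).

    Above 100: the group donates more often than its size alone predicts.
    Below 100: it donates less often.
    """
    donation_share = group_donations / total_donations
    population_share = group_population / total_population
    return 100 * donation_share / population_share

# Hypothetical group: 2% of donations but only 1% of the population.
print(representation_index(2_000, 100_000, 10_000, 1_000_000))
```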

Analysis of Amount

In analyzing amount, I wanted to know what the distribution looked like and how it varied between parties.

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##     0.09    25.00    50.00   190.20   100.00 30000.00

Clearly, there are some large outliers in the amounts. Because the initial histogram was right-skewed, I took a log transformation to see whether the amounts would then follow a normal distribution. The transformed distribution does look more normal, but not as cleanly as one might hope.
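One way to put a number on "looks more normal" is to compare skewness before and after the transform. A normal distribution has skewness 0. The Python sketch below uses the Fisher-Pearson coefficient on hypothetical amounts. Base 10 is an assumption; the log base only rescales the values and does not change their skewness.

```python
import math
import statistics

def skewness(values):
    """Fisher-Pearson skewness coefficient g1 (population moments); 0 if symmetric."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean(((x - mean) / sd) ** 3 for x in values)

# Hypothetical right-skewed donation amounts (all positive, so log10 is defined).
amounts = [5, 10, 25, 25, 50, 50, 100, 100, 250, 1000, 2500]
log_amounts = [math.log10(a) for a in amounts]

# Skewness drops sharply after the log transform but does not reach zero.
print(round(skewness(amounts), 2), round(skewness(log_amounts), 2))
```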

Next, I asked: is there a difference in the distribution of donation amounts by party?

Based on these graphs alone, it’s difficult to make out precise details. Instead, I broke the amounts out by candidate with boxplots and summary statistics.

Summary of Amounts by Party

## $`Obama, Barack`
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##     0.09    20.00    50.00   130.60   100.00 25800.00 
## 
## $`Romney, Mitt`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     1.0    50.0   100.0   413.9   285.4 30000.0
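A per-candidate summary like the one above is a group-by followed by a summary within each group. Here is a Python sketch, in the spirit of R’s `tapply(amount, candidate, summary)`. The rows are hypothetical; only the candidate labels come from the data.

```python
import statistics
from collections import defaultdict

def summary_by_group(groups, values):
    """Six-number summary of values within each group (R type-7 quartiles)."""
    buckets = defaultdict(list)
    for group, value in zip(groups, values):
        buckets[group].append(value)
    result = {}
    for group, vs in buckets.items():
        q1, median, q3 = statistics.quantiles(vs, n=4, method="inclusive")
        result[group] = {"Min.": min(vs), "1st Qu.": q1, "Median": median,
                         "Mean": statistics.fmean(vs), "3rd Qu.": q3, "Max.": max(vs)}
    return result

# Hypothetical rows: candidate label and amount for each donation.
candidates = ["Obama, Barack"] * 4 + ["Romney, Mitt"] * 4
amounts = [10, 25, 50, 100, 50, 100, 100, 500]
for cand, stats in summary_by_group(candidates, amounts).items():
    print(cand, stats)
```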

After zooming in on the graph, it becomes clear that the distribution of amounts differs noticeably between donations to Obama and donations to Romney. The numerical summary above shows the differences exactly. Romney’s median donation ($100) is twice Obama’s ($50), and his mean ($413.90) is more than three times Obama’s ($130.60). This explains why Romney’s total is not proportionally lower, even though he received far fewer donations (191,034 vs. 717,147).

## Obama, Barack  Romney, Mitt 
##      93638609      79060174
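The arithmetic can be checked directly: a candidate’s total is the number of donations times the mean donation. This short Python check uses only the counts from the data summary and the totals printed above.

```python
# Donation counts (from the data summary) and totals (from the output above).
counts = {"Obama, Barack": 717_147, "Romney, Mitt": 191_034}
totals = {"Obama, Barack": 93_638_609, "Romney, Mitt": 79_060_174}

# Mean donation = total / count; these match the "Mean" rows of the summaries.
for cand in counts:
    print(cand, round(totals[cand] / counts[cand], 1))

# Obama received about 3.75x as many donations, but only about 1.18x the money.
print(round(counts["Obama, Barack"] / counts["Romney, Mitt"], 2))
print(round(totals["Obama, Barack"] / totals["Romney, Mitt"], 2))
```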

Profession Level Analysis

There are a few noticeable trends in the occupation data. The first is that an occupation’s expected income seems to be related to the amount donated. For example, physicians and attorneys tend to donate more per person than teachers. The apparent exception is homemakers, who have no income of their own, but this makes sense on further consideration. If you’re a homemaker, your spouse probably earns enough to support the entire household. Households with a homemaker may therefore have higher incomes on average, and be able to donate larger amounts on behalf of both partners.
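A comparison like this can be sketched as follows:

1. Keep the k most frequent occupations.
2. Compute each one’s typical donation.
3. Rank them by that value.

The write-up doesn’t say which statistic its chart uses. Using the median here is my assumption, chosen because amounts are heavily right-skewed. The rows are hypothetical.

```python
import statistics
from collections import Counter, defaultdict

def median_amount_by_top_occupation(occupations, amounts, k=10):
    """Median donation for each of the k most frequent occupations, highest first."""
    top = [occ for occ, _ in Counter(occupations).most_common(k)]
    by_occ = defaultdict(list)
    for occ, amt in zip(occupations, amounts):
        by_occ[occ].append(amt)
    ranked = [(occ, statistics.median(by_occ[occ])) for occ in top]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)

# Hypothetical rows; ARTIST is rare, so it is excluded despite its large amount.
occupations = ["PHYSICIAN", "PHYSICIAN", "ATTORNEY", "ATTORNEY",
               "TEACHER", "TEACHER", "TEACHER", "ARTIST"]
amounts = [250, 500, 100, 300, 25, 50, 35, 1000]
print(median_amount_by_top_occupation(occupations, amounts, k=3))
```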

Without outside data on their share of the California population, it is difficult to comment on how often attorneys and physicians donate. Still, I’m willing to guess that they donate more often than average, because specific political issues are likely to be very relevant to their fields of work.


Once again, we see that the split between candidates varies strongly across the top occupations sampled. In these charts, we also see that professors tend to lean toward donating to the Democratic candidate. Occupations related to business functions (sales, manager, president, real estate, etc.) tend to lean toward donating to the Republican candidate.

City Level Analysis

Given what we know about amounts across all of California, I next explored city-level data.

Part of this chart is as expected: the largest cities, like Los Angeles and San Francisco, donate the most money through sheer population size. But some smaller cities also make the list. Let’s separate out the effect of population by looking at boxplots of donation amounts by city…

Clearly, some cities (like Newport Beach) give far larger amounts per donation than other cities. That is why Newport Beach makes the list despite having a population of only roughly 85,000, compared with approximately 480,000 for Sacramento.

Next, I asked: What are the most partisan cities in terms of dollars donated and number of donations?

Clearly, both city and candidate can be strongly related to the amount donated.

Finally, I wanted to know which cities are the most partisan by percentage of contributions and by percentage of total money contributed.

Note that for the partisan city analysis above, a city needed a minimum of 50 donations to be included.
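The two partisan measures and the 50-donation threshold can be sketched together in Python. The function computes two shares per city, both for one candidate:

* the share of that city’s donations going to that candidate;
* the share of that city’s dollars going to that candidate.

Cities below the threshold are dropped. The candidate labels and threshold come from the text; the rows are hypothetical. Every amount in the data is positive (the minimum is $0.09), so a city’s dollar total is never zero.

```python
from collections import defaultdict

MIN_DONATIONS = 50  # threshold stated in the text

def partisan_shares(cities, candidates, amounts,
                    party_candidate="Romney, Mitt", min_donations=MIN_DONATIONS):
    """Per city: (share of donations, share of dollars) going to party_candidate."""
    n, n_party = defaultdict(int), defaultdict(int)
    dollars, dollars_party = defaultdict(float), defaultdict(float)
    for city, cand, amt in zip(cities, candidates, amounts):
        n[city] += 1
        dollars[city] += amt
        if cand == party_candidate:
            n_party[city] += 1
            dollars_party[city] += amt
    # Drop cities with too few donations for a stable percentage.
    return {city: (n_party[city] / n[city], dollars_party[city] / dollars[city])
            for city in n if n[city] >= min_donations}

# Hypothetical rows: city "A" has 60 donations, city "B" only 10 (dropped).
cities = ["A"] * 60 + ["B"] * 10
candidates = ["Obama, Barack"] * 45 + ["Romney, Mitt"] * 25
amounts = [20] * 45 + [100] * 15 + [50] * 10
shares = partisan_shares(cities, candidates, amounts)
print(shares)

# Most partisan cities by dollar share, highest first.
print(sorted(shares.items(), key=lambda kv: kv[1][1], reverse=True))
```

Passing occupations in place of cities gives the per-occupation lean discussed earlier.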

Based on the city data, Republicans appear to do best in small, affluent communities in Southern California, while Democrats do best in Berkeley and nearby small cities in the Bay Area. A notable exception to the Southern California pattern is Hollywood, a Democratic stronghold.